Skip to content

Add differential testing harness #216

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Jun 4, 2025
Merged

Conversation

LegNeato
Copy link
Collaborator

@LegNeato LegNeato commented Feb 4, 2025

The test harness compiles and runs Rust
shaders/kernels and compares the output with those produced by equivalent WGSL
shaders/kernels. Unlike traditional reference, golden, or snapshot tests—which check
against static outputs—differential testing compares multiple independent implementations of the same logic.
This approach is more robust because discrepancies between them are more likely to
indicate real bugs rather than issues with outdated reference files.

You can run difftests via cargo difftest. This is an alias set up in .cargo/config
for cargo run --release -p difftest --. You can filter to run specific tests by
passing the (partial) filenames to cargo difftest some_file_name.

@LegNeato LegNeato marked this pull request as ready for review February 6, 2025 23:29
@LegNeato

This comment was marked as outdated.

@LegNeato
Copy link
Collaborator Author

Perhaps better DX is to have directories and just cargo run --release in each?

@LegNeato
Copy link
Collaborator Author

@Firestar99 FYI, I am going to revamp this a bit as it is a bit of a mess.

@LegNeato LegNeato marked this pull request as draft March 13, 2025 15:37
@LegNeato
Copy link
Collaborator Author

FWIW, I am almost ready to update this. It is much cleaner now :-)

@LegNeato LegNeato force-pushed the difftests branch 2 times, most recently from 2b484a8 to 4408a6e Compare March 17, 2025 01:17
@LegNeato LegNeato marked this pull request as ready for review March 17, 2025 01:21
@LegNeato
Copy link
Collaborator Author

LegNeato commented Mar 17, 2025

I think this is ready for review. Sorry it is so big! I can break it down if need be.

The main changes:

  1. Move compiletests under tests/ and add difftests there as well.
  2. Add a difftest harness
  3. Add a difftest sdk/library used by the binaries under test.
  4. Add a simple compute difftest test using wgpu (one variant written in rust, one in wgsl).
  5. Add to CI (on linux install software GPU so the tests will run)
  6. Some docs in a README.

@LegNeato
Copy link
Collaborator Author

This is what a run with passing and failing tests looks like:

Screenshot 2025-03-16 at 8 15 09 PM

On failing tests, the outputs are grouped as you may have 4 variants where 3 are the same and 1 is different. In that case the 3 same will be grouped in the output so it is easier to understand what is going on.

@LegNeato
Copy link
Collaborator Author

Another option we could do rather than binaries, is sort of what I did before (have built in harnesses which are configured) but mirror wgpu's harness and config format. Perhaps that would allow us to leverage their test corpus by just writing the rust-gpu versions of their test inputs.

If I get my naga rust-gpu backend working, we could generate the rust-gpu code on the fly from the source wgpu test 🤔

@LegNeato
Copy link
Collaborator Author

LegNeato commented Apr 2, 2025

This doubles CI time. I am sure there are things I can do to make CI faster (such as the deps_helper in compile test) but perhaps we should do that in followups?

@Firestar99
Copy link
Member

I pulled out the tests/* to tests/compiletests/* rename as a separate commit and rebased it for you.

In your previous rebase you forgot to move some of the new tests. I would like to fast-track that rename in #270 so it doesn't happen again.

@Firestar99
Copy link
Member

Firestar99 commented Jun 4, 2025

I really like the concept and I love that we can diff rust-gpu, wgsl and rust on cpu or whatever else we can think of.

I've played around with it a bit and noticed a few things:

  • I assume all difftests are in a different workspace to not pollute our root workspace and slow down cargo due to evaluating the potentially hundreds of cargo projects in difftests? Would love to see a sentence about it in the readme, there it just states it's done this way but not why it is.
  • One crate is called difftest and the other difftests, confusing!
  • compile times
    • FIXED: The slowest part of our project is compiling the c++ of spirv_tools, which we are building twice? Once to build difftest itself, run it and it building the tests builds it again? I assume we'd only need to compile it once, as we're only compiling rust shaders once as well.
    • Due to it being a different workspace, we can't share the spirv_tools or wgpu build with the main workspace. I tried setting the target dir to the same dir, didn't help, though the next issue may be affecting my results.
    • FIXED: Repeated runs still take 60s to run due to it recompiling something from scratch every time, as I see many rustc spinning up. This needs to be fixed if we ever want to iterate on bugs using difftests.
  • ci
    • ci is not checking whether the new Cargo.lock is up-to-date with cargo fetch --locked
    • Maybe we should split out difftests into a separate runner to speed up builds? Maybe do the same with compiletests? That I'd make a followup to allow for more experimentation.
    • FIXED: the difftests are not using feature use-installed-tools due to being build by difftests and not directly from ci

(continually adding to this)

@LegNeato
Copy link
Collaborator Author

LegNeato commented Jun 4, 2025

  • I assume all difftests are in a different workspace to not pollute our root workspace and slow down cargo due to evaluating the potentially hundreds of cargo projects in difftests? Would love to see a sentence about it in the readme, there it just states it's done this way but not why it is.

Yep, added text to that effect.

  • One crate is called difftest and the other difftests, confusing!

Yeah, I tried to match compiletest (they are called compiletests for the runner and https://crates.io/crates/compiletest_rs introduces a compiletest crate used by the runnner). Happy to change naming to make it less confusing.

  • Maybe we should split out difftests into a separate runner to speed up builds? Maybe do the same with compiletests? That I'd make a followup to allow for more experimentation.

Yes, I think we should. I was planning to do so in a followup. I wanted to keep this close to "as-is" so others can see where I just mirrored the existing comptiletest support.

@Firestar99
Copy link
Member

I'm working on the repeated runs, and noticed that there's some target_dir stomping going on. This is a clean run, and still the spirv-builder cargo run is marking crates that are supposedly clean as dirty:

     Running `tests/target/release/simple-compute-rust /tmp/.tmphUE36K`
   Compiling compiler_builtins v0.1.138
   Compiling core v0.0.0 (/home/firestar99/.rustup/toolchains/nightly-2024-11-22-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core)
       Dirty proc-macro2 v1.0.94: the rustflags changed
   Compiling proc-macro2 v1.0.94
   Compiling libm v0.2.11
       Dirty unicode-ident v1.0.18: the rustflags changed
   Compiling unicode-ident v1.0.18
       Dirty autocfg v1.4.0: the rustflags changed
   Compiling autocfg v1.4.0
   Compiling spirv-std-types v0.9.0 (/home/firestar99/workspace/frameworks/rust-gpu/crates/spirv-std/shared)

run3_clean.log

@LegNeato
Copy link
Collaborator Author

LegNeato commented Jun 4, 2025

We could provide pre-made binary crates that one can just drop files in for simple cases. I'd still want to keep the top-level "run cargo run --release" API for ultimate flexibility, but maybe we move some straightforward "stock" runners into the runner binary itself rather than the way I did it where they are a library used by the test binaries and everything is cargo run --release. Not sure which way is better 🤷

LegNeato and others added 6 commits June 4, 2025 16:24
This runs wgsl shaders and rust shaders and compares the output.
If the output differs, the test fails. Differential testing is better
than snapshot testing or golden file testing as there are no reference
files to get outdated.
@Firestar99
Copy link
Member

Firestar99 commented Jun 4, 2025

The stomping is happening cause spirv-builder can only find it's target_dir of target/spirv-builder when it has the "OUT_DIR" env var, which is only present if a build script is present. Otherwise, it defaults to not setting it and stomping with the cpu compile.
We have an overwrite for the target_dir directory, but it only works if spirv-builder found the target dir and then it makes it relative to that. If you were to pass in an absolute path, it would just use that, ignoring the resolved value, but still requiring that it could be resolved.
I fixed it by making spirv-builder "accept absolute target_dir_path and properly resolve relative ones with cargo_metadata", which does introduce another dependency, though we have cargo_metadata in cargo-gpu already.
From 60s of stomping we're now down to about 2s of cpu compile and 2s shader compile incrementally.

@Firestar99
Copy link
Member

Firestar99 commented Jun 4, 2025

I have one concern about the design of this system: If we look at the main functions of simple_compute_rust and *_wgsl, they are effectively the same, apart from the shader they use. With the current system, we'd need to ensure that the wgsl and rust versions dispatch the shader with the same hypercube, it could as well be the same main binary with a flag to select which shader to use. While copy-pasting those few lines may not be a huge deal, if we start to copy-paste a whole setup for a graphics pipeline, that's an issue.
Alternatively, we could have one crate depend on the other, but then the one crate's main would just be calling a single function by supplying it a different shader. Seems silly to have a crate just for that.
Instead, I'd propose to have a single crate per "test case" containing both rust and wgsl shaders. The amount of variants would have to be discovered by running the binary without any args, and with args it'll run that variant (just an enum?). The individual runs are still isolated from each other by rerunning the binary each time, while simplifying code sharing a lot. And if you wanted to add a CPU variant to the current system, you could just branch off the cpu variant before you get to the gpu variants.
You could even argue that every test could just be a binary within a single global crate.

@LegNeato
Copy link
Collaborator Author

LegNeato commented Jun 4, 2025

I assumed any code being shared by more than 2 tests would be put into the difftest crate for reuse. The reason I did it this was was for maximum control (let's say you wanted different deps or different rustflags, of even different entries in cargo.toml). But it could definitely be collapsed into one. I'm indifferent, I just wanted to make sure we could test MxN scenarios

Copy link
Member

@Firestar99 Firestar99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be fine merging it as is, using it for a bit and see how it feels like. We can still refactor it later and adjust all our existing difftests as need arises.

Also tested my spirv-builder patch with cargo-gpu, results in the exact same directory structure as before. I should mention though that I changed the type of target_dir from String to PathBuf, but I'd wager noone apart from me is using that setting anyway.

@LegNeato LegNeato added this pull request to the merge queue Jun 4, 2025
Merged via the queue into Rust-GPU:main with commit 63f98bb Jun 4, 2025
7 checks passed
@LegNeato LegNeato deleted the difftests branch June 4, 2025 15:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants